    DeepEval: An Integrated Framework for the Evaluation of Student Responses in Dialogue Based Intelligent Tutoring Systems

    The automatic assessment of student answers is one of the critical components of an Intelligent Tutoring System (ITS) because accurate assessment of student input is needed in order to provide effective feedback that leads to learning. This is a very challenging task because it requires natural language understanding capabilities. The process requires various components, such as concept identification, co-reference resolution, and ellipsis handling. As part of this thesis, we thoroughly analyzed a set of student responses obtained from an experiment with the intelligent tutoring system DeepTutor, in which college students interacted with the tutor to solve conceptual physics problems, designed an automatic answer assessment framework (DeepEval), and evaluated the framework after implementing several important components. To evaluate our system, we annotated 618 responses from 41 students for correctness. Our system performs better than the typical similarity calculation method. We also discuss various issues in automatic answer evaluation.
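
    As a concrete point of reference, the following minimal Python sketch shows the kind of bag-of-words cosine baseline the abstract refers to as the typical similarity calculation method; the tokenization, the example texts, and the 0.5 correctness threshold are illustrative assumptions, not details from the paper.

        # A bag-of-words cosine baseline; tokenization and the 0.5 threshold
        # are illustrative assumptions, not details from the DeepEval paper.
        import math
        from collections import Counter

        def cosine_similarity(a: str, b: str) -> float:
            """Cosine similarity between two texts under a bag-of-words model."""
            va, vb = Counter(a.lower().split()), Counter(b.lower().split())
            dot = sum(va[w] * vb[w] for w in va)
            norm = math.sqrt(sum(c * c for c in va.values())) * \
                   math.sqrt(sum(c * c for c in vb.values()))
            return dot / norm if norm else 0.0

        reference = "the net force on the ball is gravity acting downward"
        student = "gravity is the only force acting on the ball"
        score = cosine_similarity(student, reference)
        print(f"similarity = {score:.2f}; judged {'correct' if score > 0.5 else 'incorrect'}")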

    Measuring Semantic Textual Similarity and Automatic Answer Assessment in Dialogue Based Tutoring Systems

    This dissertation presents methods and resources proposed to improve the measurement of semantic textual similarity and its applications to student response understanding in dialogue-based Intelligent Tutoring Systems.

    In order to predict the extent of similarity between a given pair of sentences, we have proposed machine learning models using dozens of features, such as scores calculated using optimal multi-level alignment, vector-based compositional semantics, and machine translation evaluation methods. Furthermore, we have proposed models that add an interpretation layer on top of similarity measurement systems. Our models for predicting and interpreting semantic similarity have been the top performing systems in SemEval (a premier venue for semantic evaluation) for the last three years. The correlations between our models' predictions and the human judgments were above 0.80 for several datasets, and our models were more robust than many other top performing systems. Moreover, we have proposed Bayesian models to adapt similarity models across domains.

    We have also proposed a novel Neural Network based word representation mapping approach which allows us to map the vector-based representation of a word found in one model to another model where the word representation is missing, effectively pooling together the vocabularies and corresponding representations across models. Our experiments show that model coverage increased by a few to several times depending on which model's vocabulary is taken as the reference. Also, the transformed representations were well correlated with the native target model vectors, showing that the mapped representations can be used with confidence to substitute for the missing word representations in the target model.

    Furthermore, we have proposed methods to improve the assessment of open-ended answers in dialogue-based tutoring systems, which is very challenging because of the variation in student answers, which often are not self-contained and need contextual information (e.g., dialogue history) in order to better assess their correctness. To that end, we have proposed Probabilistic Soft Logic (PSL) models augmenting semantic similarity information with other knowledge.

    To detect intra- and inter-sentential negation scope and focus in tutorial dialogues, we have developed Conditional Random Fields (CRF) models. The results indicate that our approach is very effective in detecting negation scope and focus in the tutorial dialogue context and can be further developed to augment natural language understanding systems.

    Additionally, we created resources (datasets, models, and tools) for fostering research in semantic similarity and student response understanding in conversational tutoring systems.
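
    To illustrate the word representation mapping idea, here is a minimal Python sketch that learns a transform on the vocabulary shared by two embedding models and uses it to project a missing word into the target space; the linear least-squares formulation and the toy random vectors are simplifying assumptions, whereas the dissertation describes a neural-network mapping.

        # Sketch of cross-model word-vector mapping: learn a transform on the
        # shared vocabulary, then project source-only words into the target
        # space. Linear least squares is an assumption made for brevity; the
        # dissertation proposes a neural-network mapping.
        import numpy as np

        rng = np.random.default_rng(0)

        # Toy embedding models: word -> vector (dimensions differ across models).
        source = {w: rng.normal(size=50) for w in ["force", "mass", "energy", "torque"]}
        target = {w: rng.normal(size=100) for w in ["force", "mass", "energy"]}

        shared = [w for w in source if w in target]   # training pairs
        X = np.stack([source[w] for w in shared])     # (n, 50)
        Y = np.stack([target[w] for w in shared])     # (n, 100)

        # Least-squares linear map W such that X @ W approximates Y.
        W, *_ = np.linalg.lstsq(X, Y, rcond=None)

        # "torque" is missing from the target model; map its source vector over.
        mapped = source["torque"] @ W
        print(mapped.shape)  # (100,) -- a stand-in vector in the target space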

    Assessing student response in tutorial dialogue context using probabilistic soft logic

    Automatic answer assessment systems typically apply semantic similarity methods in which student responses are compared with reference answers in order to assess their correctness. But student responses in dialogue-based tutoring systems are often grammatically and semantically incomplete, and additional information (e.g., dialogue history) is needed to better assess their correctness. To that end, we have proposed augmenting semantic similarity based models with, for example, the knowledge level of the student and the question difficulty, and jointly modeling their complex interactions using Probabilistic Soft Logic (PSL). The results of the proposed PSL models for inferring the correctness of a given answer on the DT-Grade dataset show a more than 7% improvement in accuracy over the results obtained using a semantic similarity model.
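
    The following toy Python sketch illustrates the soft-logic intuition behind such PSL models: truth values live in [0, 1] and logical connectives are relaxed with Lukasiewicz operators. The particular predicates and truth values below are illustrative assumptions, not the paper's actual rule set.

        # Soft truth values in [0, 1] combined with Lukasiewicz operators,
        # the relaxation PSL is built on. Predicates and values are invented
        # for illustration.
        def luk_and(a, b):            # Lukasiewicz t-norm (soft conjunction)
            return max(0.0, a + b - 1.0)

        def luk_implies(body, head):  # 1.0 means the rule is fully satisfied
            return min(1.0, 1.0 - body + head)

        similarity = 0.6   # soft truth: answer resembles the reference answer
        knowledge = 0.8    # soft truth: student has shown good prior knowledge
        difficulty = 0.3   # soft truth: the question is hard

        correct = 0.75     # candidate truth value for Correct(answer)

        # Rule 1: Similar(a, ref) & HighKnowledge(s) -> Correct(a)
        r1 = luk_implies(luk_and(similarity, knowledge), correct)
        # Rule 2: HardQuestion(q) & !Similar(a, ref) -> !Correct(a)
        r2 = luk_implies(luk_and(difficulty, 1.0 - similarity), 1.0 - correct)

        # PSL infers the Correct(a) value maximizing weighted rule satisfaction;
        # here we just report how satisfied each rule is at the candidate value.
        print(f"rule 1 satisfaction: {r1:.2f}, rule 2 satisfaction: {r2:.2f}")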

    DT-Neg: Tutorial dialogues annotated for negation scope and focus in context

    Negation occurs more frequently in dialogue than in commonly written texts, such as literary texts. Furthermore, the scope and focus of negation depend more on context in dialogues than in other forms of text. Existing negation datasets have focused on non-dialogue texts, such as literary texts, where the scope and focus of negation are normally present within the same sentence as the negation itself, and are therefore not the most appropriate to inform the development of negation handling algorithms for dialogue-based systems. In this paper, we present the DT-Neg corpus (DeepTutor Negation corpus), which contains texts extracted from tutorial dialogues in which students interacted with an Intelligent Tutoring System (ITS) to solve conceptual physics problems. The DT-Neg corpus contains annotated negations in student responses, with scope and focus marked based on the context of the dialogue. Our dataset contains 1,088 instances and is available for research purposes at http://language.memphis.edu/dt-neg
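
    For illustration only, one annotated instance might look like the following Python record; the field names and structure are hypothetical, not the released schema of the corpus.

        # Hypothetical sketch of one DT-Neg instance once loaded; the field
        # names and format are assumptions, not the corpus's actual schema.
        instance = {
            "dialogue_context": [
                "Tutor: Does friction act on the puck while it slides?",
                "Student: No, it does not act on it.",
            ],
            "response": "No, it does not act on it.",
            "negation_cue": "not",
            "scope": "it does not act on it",  # tokens affected by the negation
            "focus": "act",                    # the part most prominently negated
        }
        print(instance["scope"])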

    Latent semantic analysis models on Wikipedia and TASA

    This paper introduces a collection of freely available Latent Semantic Analysis (LSA) models built on the entire English Wikipedia and the TASA corpus. The models differ not only in their source, Wikipedia versus TASA, but also in the linguistic items they focus on: all words, content words, nouns and verbs, and main concepts. Generating such models from large datasets (e.g., Wikipedia) that can provide large coverage of the actual vocabulary in use is computationally challenging, which is why large LSA models are rarely available. Our experiments show that for the task of word-to-word similarity, the scores assigned by these models are strongly correlated with human judgment, outperforming many other frequently used measures and comparable to the state of the art.
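
    A minimal Python sketch of the word-to-word similarity task used in the evaluation: cosine similarity between two word vectors. The toy random vectors below stand in for rows of a real LSA space built from Wikipedia or TASA.

        # Word-to-word similarity as cosine between LSA word vectors; the
        # random 300-dimensional vectors are placeholders for a real model.
        import numpy as np

        def lsa_word_similarity(v1: np.ndarray, v2: np.ndarray) -> float:
            """Cosine similarity between two LSA word vectors."""
            return float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))

        rng = np.random.default_rng(42)
        lsa_model = {w: rng.normal(size=300) for w in ["car", "automobile", "banana"]}

        print(lsa_word_similarity(lsa_model["car"], lsa_model["automobile"]))
        print(lsa_word_similarity(lsa_model["car"], lsa_model["banana"]))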

    Dialogue act classification in human-to-human tutorial dialogues

    We present in this paper preliminary results on dialogue act classification in human-to-human tutorial dialogues. Dialogue acts are a way to characterize the actions of tutors and students based on the language-as-action theory. This work serves our larger goal of identifying patterns in tutors' actions, in the form of dialogue acts, that relate to learning. The preliminary results we obtained for dialogue act classification using a machine learning approach are promising.
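
    As a sketch of what such a machine learning approach can look like, the following Python example trains a simple text classifier over utterances; the tiny label set, the example utterances, and the TF-IDF plus logistic regression pipeline are illustrative assumptions, since the paper does not commit to a particular feature set or classifier here.

        # Dialogue act classification as supervised text classification;
        # labels, utterances, and the pipeline choice are illustrative.
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline

        utterances = [
            "What is the net force here?",        # Question
            "The net force is zero.",             # Answer
            "Good job, that is exactly right.",   # PositiveFeedback
            "Can you explain why?",               # Question
            "Because the forces cancel out.",     # Answer
            "Well done!",                         # PositiveFeedback
        ]
        acts = ["Question", "Answer", "PositiveFeedback"] * 2

        clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
        clf.fit(utterances, acts)
        print(clf.predict(["Why does the ball slow down?"]))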

    Similarity measures based on Latent Dirichlet Allocation

    We present in this paper the results of our investigation of semantic similarity measures at the word and sentence level based on two fully automated approaches to deriving meaning from large corpora: Latent Dirichlet Allocation, a probabilistic approach, and Latent Semantic Analysis, an algebraic approach. The focus is on similarity measures based on Latent Dirichlet Allocation, due to its novelty, while the Latent Semantic Analysis measures are used for comparison purposes. We explore two types of measures based on Latent Dirichlet Allocation: measures based on distances between probability distributions, which can be applied directly to larger texts such as sentences, and a word-to-word similarity measure that is then expanded to work at the sentence level. We present results using paraphrase identification data from the Microsoft Research Paraphrase corpus.
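
    The following Python sketch illustrates both kinds of measures: a distance between topic mixtures for larger texts (Hellinger distance is used here as one concrete choice) and a word-to-word similarity over per-word topic distributions. The toy distributions stand in for output from a trained LDA model.

        # Two LDA-based measures: a distance between topic mixtures of two
        # texts, and a word-to-word similarity over per-word topic
        # distributions. The numbers below are placeholders for LDA output.
        import numpy as np

        def hellinger(p: np.ndarray, q: np.ndarray) -> float:
            """Hellinger distance between two probability distributions."""
            return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))

        # Topic mixtures (theta) for two sentences under a 4-topic LDA model.
        theta_s1 = np.array([0.70, 0.10, 0.10, 0.10])
        theta_s2 = np.array([0.60, 0.20, 0.10, 0.10])
        print(f"sentence distance: {hellinger(theta_s1, theta_s2):.3f}")

        # Per-word topic distributions P(topic | word); similarity = 1 - distance.
        p_car = np.array([0.80, 0.10, 0.05, 0.05])
        p_auto = np.array([0.75, 0.15, 0.05, 0.05])
        print(f"word similarity: {1.0 - hellinger(p_car, p_auto):.3f}")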

    Automated assessment of open-ended student answers in tutorial dialogues using Gaussian Mixture Models

    Open-ended student answers often need to be assessed in context. However, few previous works consider context when automatically assessing student answers. Furthermore, student responses vary significantly in their explicit content and writing style, which leads to a wide range of assessment scores for the same qualitative assessment category, e.g., correct answers vs. incorrect answers. In this paper, we propose an approach to assessing student answers that takes context into account and handles this variability using probabilistic Gaussian Mixture Models (GMMs). We developed the model using a recently released corpus called DT-Grade, which was manually annotated, taking context into account, with four different levels of answer correctness. Our best GMM model outperforms the baseline model by a margin of 9% in terms of accuracy.
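
    The following Python sketch shows one plausible shape of GMM-based assessment: fit one Gaussian Mixture per correctness category over answer feature vectors, then label a new answer by the mixture under which it is most likely. The two-dimensional features, component counts, and synthetic data are illustrative assumptions, not the paper's configuration.

        # One GMM per correctness category; a new answer gets the label of
        # the mixture with the highest log-likelihood. Features and data are
        # synthetic placeholders.
        import numpy as np
        from sklearn.mixture import GaussianMixture

        rng = np.random.default_rng(0)

        # Toy features per answer, e.g. (similarity to reference, context score).
        correct = rng.normal(loc=[0.8, 0.7], scale=0.1, size=(50, 2))
        incorrect = rng.normal(loc=[0.3, 0.4], scale=0.15, size=(50, 2))

        gmms = {
            "correct": GaussianMixture(n_components=2, random_state=0).fit(correct),
            "incorrect": GaussianMixture(n_components=2, random_state=0).fit(incorrect),
        }

        new_answer = np.array([[0.75, 0.65]])
        label = max(gmms, key=lambda k: gmms[k].score(new_answer))
        print(label)  # likelihood-based assessment of the new answer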

    A sentence similarity method based on chunking and information content

    This paper introduces a method for assessing the semantic similarity between sentences which relies on the assumption that the meaning of a sentence is captured by its syntactic constituents and the dependencies between them. We obtain both the constituents and their dependencies from a syntactic parser. Our algorithm considers that two sentences have the same meaning if it can find a good mapping between their chunks and if the chunk dependencies in one text are preserved in the other. Moreover, the algorithm takes into account that every chunk has a different importance with respect to the overall meaning of a sentence, which is computed based on the information content of the words in the chunk. Experiments conducted on a well-known paraphrase data set show that the performance of our method is comparable to the state of the art.
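
    The following Python sketch conveys the core idea on toy data: greedily align chunks across the two sentences and weight each match by the information content of the chunk's words. The token-overlap chunk scorer and the made-up corpus frequencies are simplifying assumptions, since the paper derives chunks and dependencies from a syntactic parser and uses real corpus statistics.

        # Greedy chunk alignment weighted by information content (IC); the
        # chunk scorer and frequency table are invented for illustration.
        import math

        freq = {"the": 1000, "a": 900, "dog": 30, "cat": 25,
                "chased": 10, "pursued": 8}
        total = sum(freq.values())

        def info_content(word: str) -> float:
            """IC(w) = -log P(w); rarer words carry more weight."""
            return -math.log(freq.get(word, 1) / total)

        def chunk_match(c1: list, c2: list) -> float:
            """Fraction of overlapping tokens between two chunks."""
            return len(set(c1) & set(c2)) / max(len(c1), len(c2))

        def sentence_similarity(chunks1, chunks2) -> float:
            score = weight = 0.0
            for c1 in chunks1:
                best = max(chunks2, key=lambda c2: chunk_match(c1, c2))
                ic = sum(info_content(w) for w in c1)   # chunk importance
                score += ic * chunk_match(c1, best)
                weight += ic
            return score / weight if weight else 0.0

        s1 = [["the", "dog"], ["chased"], ["a", "cat"]]   # chunks from a parser
        s2 = [["the", "dog"], ["pursued"], ["the", "cat"]]
        print(f"{sentence_similarity(s1, s2):.2f}")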

    On paraphrase identification corpora

    We analyze in this paper a number of data sets proposed over the last decade or so for the task of paraphrase identification. The goal of the analysis is to identify the advantages as well as the shortcomings of these previously proposed data sets. Based on the analysis, we make recommendations about how to improve the process of creating and using such data sets for evaluating future approaches to paraphrase identification or the more general task of semantic similarity. The recommendations are meant to improve our understanding of what a paraphrase is, offer a fairer ground for comparing approaches, increase the diversity of actual linguistic phenomena that future data sets will cover, and offer ways to improve our understanding of the contributions of the various modules or approaches proposed for solving paraphrase identification or similar tasks. We also developed a data collection tool, called Data Collector, that proactively targets the collection of paraphrase instances covering linguistic phenomena important to paraphrasing.